BUG: fix DataFrame.__getitem__ and .loc with non-list listlikes #21313

toobaz · 2018-06-04T13:49:12Z

closes Dict/dict keys in DataFrame.__getitem__ #21294
closes DataFrame[np.nan] raises TypeError with non-unique columns #21428
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

xref #21309, but worth fixing separately

jreback · 2018-06-05T10:31:31Z

this would have to be for 0.24 (not convinced we should do this)

toobaz · 2018-06-05T12:13:32Z

not convinced we should do this

Do you think we should disable listlikes in other cases? Or you don't find the discrepancy problematic?

pep8speaks · 2018-06-11T20:36:58Z

Hello @toobaz! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 07, 2018 at 08:11 Hours UTC

codecov · 2018-06-11T22:54:09Z

Codecov Report

Merging #21313 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21313      +/-   ##
==========================================
- Coverage   91.95%   91.95%   -0.01%     
==========================================
  Files         160      160              
  Lines       49820    49818       -2     
==========================================
- Hits        45812    45809       -3     
- Misses       4008     4009       +1

Flag	Coverage Δ
#multiple	`90.33% <100%> (-0.01%)`	⬇️
#single	`42.07% <75.67%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.19% <100%> (ø)`	⬆️
pandas/core/sparse/frame.py	`94.78% <100%> (-0.06%)`	⬇️
pandas/core/internals.py	`95.46% <0%> (-0.08%)`	⬇️
pandas/core/indexes/category.py	`97.28% <0%> (+0.27%)`	⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 620abc4...5bd6eb8.

jreback · 2018-06-11T23:57:11Z

doc/source/whatsnew/v0.23.1.txt

@@ -110,6 +110,8 @@ Bug Fixes
 - Bug in :meth:`Series.reset_index` where appropriate error was not raised with an invalid level name (:issue:`20925`)
 - Bug in :func:`interval_range` when ``start``/``periods`` or ``end``/``periods`` are specified with float ``start`` or ``end`` (:issue:`21161`)
 - Bug in :meth:`MultiIndex.set_names` where error raised for a ``MultiIndex`` with ``nlevels == 1`` (:issue:`21149`)
+- Bug in :meth:`DataFrame.__getitem__` and :meth:`DataFrame.loc` which did not accept columns keys passed as non-list iterables (:issue:`21294`)


would be for 0.24

jreback · 2018-06-11T23:57:50Z

pandas/core/frame.py

+                if self.columns.nlevels > 1:
+                    return self._getitem_multilevel(key)
+                return self._get_item_cache(key)
+        except (ValueError, TypeError):
            pass


what hits this exception here?

Lots of cases, for instance pd.DataFrame(index=range(3), columns=range(3))[['a', 'b']]

so a list-like key?

Yes, I guess so
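This exchange turns on one detail: a list key is unhashable, so a hash-based membership test such as `key in self.columns` raises `TypeError` instead of returning `False`. The sketch below uses a plain dict as a stand-in for the column index of `pd.DataFrame(index=range(3), columns=range(3))`. That stand-in is an assumption; pandas' `Index` is not a dict, but its membership test also hashes the key first. It shows why the `except (ValueError, TypeError)` clause ends up catching list-like keys:

```python
# Stand-in for the column index: a dict keyed by column label (an assumption;
# pandas' Index is not a dict, but its membership test also hashes the key).
columns = {0: "col0", 1: "col1", 2: "col2"}


def lookup_single_column(key):
    """Return the column for a single hashable label, else None."""
    try:
        if key in columns:  # an unhashable key such as a list raises TypeError here
            return columns[key]
    except (ValueError, TypeError):
        pass  # not usable as a single label: fall through to the list-like path
    return None


print(lookup_single_column(1))           # col1
print(lookup_single_column(["a", "b"]))  # None
```

The `try`/`except` is what lets a single code path probe "is this one label?" without first classifying the key.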

jreback · 2018-06-11T23:59:12Z

pandas/core/frame.py

-            return self._get_item_cache(key)
+        # We are left with two options: a single key, and a collection of keys,
+        # We interpret tuples as collections only for non-MultiIndex
+        coll_key = is_list_like(key) and (not isinstance(key, tuple) or


can you take the inverse and name this is_single_key

a single tuple is a single key, yes? (MI or no)?


Right, I was assuming tuples as collections were allowed, luckily they weren't
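The test the review converged on, `is_single_key = isinstance(key, tuple) or not is_list_like(key)`, can be tried out without pandas. The sketch assumes a simplified `is_list_like` (any iterable other than `str`/`bytes`). The real `pandas.api.types.is_list_like` has more special cases, so treat this as an approximation:

```python
from collections.abc import Iterable


def is_list_like(obj):
    """Rough stand-in for pandas.api.types.is_list_like (assumption: iterable, not str/bytes)."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def is_single_key(key):
    # A tuple is always one label (possibly a MultiIndex label),
    # never a collection of labels, whether or not the columns are a MultiIndex.
    return isinstance(key, tuple) or not is_list_like(key)


d = {"a": 1, "b": 2}
for key in ["a", ("a", "b"), ["a", "b"], d.keys(), 3]:
    print(repr(key), is_single_key(key))
```

With this rule, `d.keys()` falls on the collection side, just like a list. That is the behavior the PR asks for when dict keys are used as column keys.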

jreback · 2018-06-12T00:00:04Z

pandas/core/frame.py


-    def _getitem_array(self, key):
+        if not coll_key:
+            # This test preserves #9519; the second part preserves #21309


can you give a more informative comment

jreback · 2018-06-12T00:00:47Z

pandas/core/frame.py

+        elif len(key) != len(self.index):
+            raise ValueError('Item wrong length %d instead of %d.' %
+                             (len(key), len(self.index)))
+        # check_bool_indexer will throw exception if Series key cannot


blank line here
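The length check in this hunk guards boolean indexing: a boolean key must have exactly one entry per row. Here is a pure-Python sketch with rows held in a plain list. `filter_rows` is a hypothetical helper, not pandas code, and it reuses the same error message:

```python
def filter_rows(rows, mask):
    """Keep the rows whose mask entry is True (hypothetical helper)."""
    if len(mask) != len(rows):
        # Same message format as the check in the diff.
        raise ValueError('Item wrong length %d instead of %d.' %
                         (len(mask), len(rows)))
    return [row for row, keep in zip(rows, mask) if keep]


print(filter_rows([10, 20, 30], [True, False, True]))  # [10, 30]
```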

jreback · 2018-06-12T00:01:16Z

pandas/tests/frame/test_constructors.py

@@ -501,9 +501,11 @@ def test_constructor_dict_of_tuples(self):
        tm.assert_frame_equal(result, expected, check_dtype=False)

    def test_constructor_dict_multiindex(self):
-        check = lambda result, expected: tm.assert_frame_equal(


can you parameterize this test?

Not here, but I can avoid fixing the lambda
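Later commits in this PR replace lambdas bound to names with ordinary `def` statements, which is the pattern flake8 reports as E731 ("do not assign a lambda expression, use a def"). Below is a minimal sketch of the change. `check` is a hypothetical stand-in for the truncated `tm.assert_frame_equal(...)` call in the test:

```python
# Before: a lambda bound to a name (flagged by flake8 as E731).
check_lambda = lambda result, expected: result == expected  # noqa: E731


# After: an equivalent named function, which carries a real __name__
# (useful in tracebacks) and can have a docstring.
def check(result, expected):
    """Hypothetical stand-in for the strict comparison used in the test."""
    return result == expected


print(check_lambda.__name__)  # <lambda>
print(check.__name__)         # check
```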

jreback · 2018-06-12T00:01:48Z

pandas/tests/frame/test_constructors.py

            indexer = np.arange(len(df.columns))[isna(df.columns)]

-            if len(indexer) == 1:


comment on each of these cases

jreback · 2018-06-12T00:02:01Z

pandas/tests/frame/test_constructors.py

                tm.assert_series_equal(df.iloc[:, indexer[0]],
                                       df.loc[:, np.nan])

-            # multiple nans should fail
+            # multiple nans should result in DataFrame
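The test distinguishes two cases. A single NaN column label selects one column (compared against `df.loc[:, np.nan]`), while after this PR several NaN labels are expected to yield a DataFrame rather than fail. The pure-Python sketch below models columns as a list of labels plus a parallel list of column data. The `select_nan` helper is hypothetical. Note that NaN never compares equal to itself, which is why the test locates NaN labels with `isna` rather than `==`:

```python
import math


def nan_positions(columns):
    """Positions of NaN labels, like np.arange(len(cols))[isna(cols)] in the test."""
    return [i for i, label in enumerate(columns)
            if isinstance(label, float) and math.isnan(label)]


def select_nan(columns, data):
    """Hypothetical selection by NaN label: one match gives one column, several give a list."""
    indexer = nan_positions(columns)
    if not indexer:
        raise KeyError("nan")
    if len(indexer) == 1:
        return data[indexer[0]]        # analogous to returning a Series
    return [data[i] for i in indexer]  # analogous to returning a DataFrame


nan = float("nan")
print(select_nan([1.0, nan, 3.0], ["a", "b", "c"]))  # b
print(select_nan([nan, 2.0, nan], ["a", "b", "c"]))  # ['a', 'c']
```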


toobaz · 2018-06-12T09:01:57Z

@jreback ready for me

jorisvandenbossche · 2018-06-12T17:07:01Z

I have my doubts we should do this, commented on the relevant issue: #21294

toobaz · 2018-06-30T15:02:22Z

Discussion (I think) concluded, conflicts fixed by rebasing... ready for me. Objections, @jreback?

jreback

@toobaz I had some comments that I didn't click on. Also, can you rebase on master?

jreback · 2018-06-13T21:23:22Z

pandas/core/frame.py

+            if self.columns.is_unique and key in self.columns:
+                if self.columns.nlevels > 1:
+                    return self._getitem_multilevel(key)
+                return self._get_item_cache(key)


why are you changing to directly use _get_item_cache here rather than _getitem_column? (is it removed?)

Yes, removed. It did a uniqueness test which is no longer necessary, and was misleading anyway, as it could not really handle all cases in which a single column is returned.

jreback · 2018-06-13T21:25:01Z

pandas/core/frame.py

+                if self.columns.nlevels > 1:
+                    return self._getitem_multilevel(key)
+                return self._get_item_cache(key)
+        except (ValueError, TypeError):
            pass


so a list-like key?

jreback · 2018-06-13T21:25:58Z

pandas/core/frame.py

        indexer = convert_to_index_sliceable(self, key)
        if indexer is not None:
-            return self._getitem_slice(indexer)


why the change here?

removed?

Yes, removed, it was a pretty useless one-liner
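The removed `_getitem_slice` only forwarded the indexer returned by `convert_to_index_sliceable`, so inlining it presumably leaves the dispatch order unchanged: a slice-like key selects rows before any column lookup happens. The toy model below assumes a frame stored as a dict of column lists and handles only plain `slice` objects. The `toy_getitem` name is hypothetical, and pandas' `convert_to_index_sliceable` also handles some string keys on datetime-like indexes:

```python
def toy_getitem(frame, key):
    """Toy dispatch: plain slices select rows, anything else is a column label.

    `frame` is a dict mapping column label -> list of values (an assumption
    made for illustration only).
    """
    if isinstance(key, slice):
        # Row slicing: apply the slice to every column, keeping all columns.
        return {label: values[key] for label, values in frame.items()}
    return frame[key]


frame = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(toy_getitem(frame, slice(0, 2)))  # {'a': [1, 2], 'b': [4, 5]}
print(toy_getitem(frame, "a"))          # [1, 2, 3]
```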

jreback · 2018-06-13T21:27:23Z

pandas/core/frame.py

+        is_single_key = isinstance(key, tuple) or not is_list_like(key)
+
+        if is_single_key:
+            if self.columns.nlevels > 1:


isn’t this case handled by _getitem_multilevel (above)?

only if columns.is_unique

jreback · 2018-06-13T21:28:50Z

pandas/core/frame.py

+            if self.columns.nlevels > 1:
+                return self._getitem_multilevel(key)
+            indexer = self.columns.get_loc(key)
+            if is_integer(indexer):


this is an argument for _take to accept a scalar integer

Not too sure. _take is such a fundamental method that there might be good reasons to keep it simple. Anyway, we can discuss this (in some other issue).

Moreover, the problem should disappear when we fix #9519, that is, when the return type becomes predictable from the index (non-)uniqueness.
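The `is_integer(indexer)` branch exists because `Index.get_loc` does not always return an integer. For a unique label it does, but for a duplicated label it can return a slice or a boolean mask, and the result type then depends on the index rather than on the key (the #9519 issue mentioned above). The simplified model below is an assumption: it returns a slice when duplicates are contiguous and a mask otherwise, while the exact rule pandas uses differs. `get_loc` and `select_column` here are hypothetical pure-Python helpers:

```python
def get_loc(labels, key):
    """Simplified model of Index.get_loc (assumption): an int for a unique label,
    a slice for a contiguous run of duplicates, otherwise a boolean mask."""
    positions = [i for i, label in enumerate(labels) if label == key]
    if not positions:
        raise KeyError(key)
    if len(positions) == 1:
        return positions[0]
    if positions[-1] - positions[0] + 1 == len(positions):
        return slice(positions[0], positions[-1] + 1)
    return [label == key for label in labels]


def select_column(labels, data, key):
    indexer = get_loc(labels, key)
    if isinstance(indexer, int):
        return data[indexer]  # one column: Series-like result
    if isinstance(indexer, slice):
        return data[indexer]  # several columns: frame-like result
    return [col for col, keep in zip(data, indexer) if keep]


print(select_column(["a", "b", "b", "c"], ["A", "B1", "B2", "C"], "b"))  # ['B1', 'B2']
```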

toobaz · 2018-07-04T20:39:47Z

@jreback added a comment clarifying the except clause, rebased, ready for me

jreback

ok lgtm. can you add a whatsnew, maybe needs a subsection.


toobaz · 2018-07-07T08:12:53Z

ok lgtm. can you add a whatsnew, maybe needs a subsection.

I don't think the change warrants a whatsnew subsection, since it is relatively marginal. But I do think we'll need to clarify list-likes in general in the docs; see #21784

jreback · 2018-07-07T14:25:03Z

thanks @toobaz nice!

…das-dev#21313) * BUG: fix DataFrame.__getitem__ and .loc with non-list listlikes close pandas-dev#21294 close pandas-dev#21428

gfyoung added the Indexing Related to indexing on series/frames, not to indexes themselves label Jun 6, 2018

toobaz force-pushed the df_getitem_21294 branch from 34647a8 to f56c72f June 11, 2018 20:36

toobaz force-pushed the df_getitem_21294 branch from f56c72f to 445dcdf June 11, 2018 20:54

toobaz mentioned this pull request Jun 11, 2018

RLS: 0.23.1 #21312

Closed

jreback requested changes Jun 12, 2018

jreback added this to the 0.24.0 milestone Jun 12, 2018

toobaz force-pushed the df_getitem_21294 branch from 445dcdf to ba6c074 June 12, 2018 05:55

toobaz mentioned this pull request Jun 13, 2018

Dict/dict keys in DataFrame.__getitem__ #21294

Closed

toobaz force-pushed the df_getitem_21294 branch from ba6c074 to 649df47 June 30, 2018 14:10

jreback requested changes Jul 3, 2018

toobaz force-pushed the df_getitem_21294 branch from 649df47 to 0eb16fd July 3, 2018 23:56

toobaz mentioned this pull request Jul 4, 2018

pd.Categorical.contains([item, otheritem]) raises ValueError #21729

Closed

toobaz force-pushed the df_getitem_21294 branch from 0eb16fd to 9c57fd4 July 4, 2018 16:31

jreback requested changes Jul 5, 2018

toobaz added 5 commits July 7, 2018 10:10

BUG: fix DataFrame.__getitem__ and .loc with non-list listlikes

2e63f5b

close pandas-dev#21294 close pandas-dev#21428

CLN: change lambdas to normal function definitions

c2dbc40

CLN: remove unnecessary __getitem__ override

1d98cc7

CLN: replace lambdas with defs

837104b

TST: fixed test for multiple NaNs in index

5bd6eb8

toobaz force-pushed the df_getitem_21294 branch from 9c57fd4 to 5bd6eb8 July 7, 2018 08:10

jreback approved these changes Jul 7, 2018

jreback merged commit 32ee973 into pandas-dev:master Jul 7, 2018

toobaz deleted the df_getitem_21294 branch July 8, 2018 08:24

CJStadler mentioned this pull request Jun 20, 2019

Error running calculate feature matrix alteryx/featuretools#614

Closed
